Method of, and apparatus for, planar audio tracking

ABSTRACT

A planar audio tracking system comprises a square array of four microphones (M₁, M₂, M₃, M₄) arranged as first and second cross-dipole microphones and a virtually constructed monopole microphone. The signals from these microphones undergo directional pre-processing and the results are applied to a filtered sum beamformer (FSB) (32). The FSB identifies functions (h_(d)(0), h_(d)(π/2) and h_(m)) of the FSB which are representative of impulse responses from desired audio source(s) to the first and second cross-dipole and the monopole microphone, respectively. The functions of the first cross-dipole and the monopole microphones and the functions of the second cross-dipole and the monopole microphones are cross-correlated to produce respective estimates (ψ_(c)(l) and ψ_(s)(l)) representative of the lag of the most dominant audio source. An angle-estimate ($\hat{\varphi}$) of the most dominant source is determined using the estimates of lag. Other embodiments of the tracking system may comprise 3 microphones arranged in a circular array and forming first and second cross-dipoles and a virtual monopole.

The present invention relates to a method of, and apparatus for, planar audio tracking.

Techniques for tracking a source are known from the field of navigation, radar and sonar. One of the simplest source-tracking techniques employs a crossed dipole array: two dipole sensors centered at the same point and oriented at right angles.

Crossed dipoles have been used for radio direction finding since the early days of radio. S. W. Davies “Bearing Accuracies for Arctan Processing of Crossed Dipole Arrays” in Proc. OCEANS 1987, vol. 19, September 1987, pp 351-356 states that for a crossed dipole array with one dipole orientated towards the north, signals proportional to the sine and cosine of the source bearing are obtained and an estimate of the source bearing, $\hat{\varphi}$, can be obtained through the arctan of the ratio of these components. If there is an additional omnidirectional sensor located at the centre of the crossed dipole array, then its output may be used for synchronous detection of the “sense” or sign of the sine and cosine outputs; this allows the use of a four quadrant inverse tangent function to obtain unambiguous bearing estimates. This article studies the properties of a bearing estimator based on time-averaged products of the omnidirectional sensor output, u_(o)(t), with the north-south oriented (“cosine”) dipole output, u_(c)(t), and the east-west oriented (“sine”) dipole output, u_(s)(t).

U.S. Pat. No. 6,774,934 relates to camera positioning means used to point a camera to a speaking person in a video conferencing system. In order to find the correct direction for a camera, the system is required to determine the position from which the sound is transmitted. This is done by using at least two microphones receiving the speech signal and measuring the transmission delay between the signals received by the microphones. The delay is determined by first determining the impulse responses (h₁) and (h₂) and subsequently calculating a cross correlation function between these impulse responses. From the main peak in the cross correlation function, the delay value is determined. The described system is satisfactory when the microphones are spaced sufficiently wide apart that a delay value can be determined.

A drawback of currently known audio tracking techniques is that the dominant reflection of the audio source (via walls and tables for example) negatively influences the result of the audio-tracking.

An object of the present invention is to be able to derive a bearing from closely spaced microphones.

According to one aspect of the present invention there is provided a method of planar audio tracking using at least three microphones from which virtual first and second cross-dipole microphones and a virtual monopole microphone are constructed, the method comprising directional pre-processing signals from the first and second cross-dipole microphones and the monopole microphone, filtering the results of the directional pre-processing of the signals, identifying functions representative of impulse responses from desired audio source(s) to the first and second cross-dipole and the monopole microphones, respectively, cross-correlating the functions of the first cross-dipole and the monopole microphones and the functions of the second cross-dipole and the monopole microphones to produce respective estimates representative of the lag of the most dominant audio source, and using the estimates representative of lag to determine an angle-estimate of the most dominant source.

According to another aspect of the present invention there is provided a planar audio tracking apparatus comprising at least three microphones from which virtual first and second cross-dipole microphones and a virtual monopole microphone are constructed, means for directional pre-processing signals from the first and second cross-dipole microphones and the monopole microphone, means for filtering the results of the directional pre-processing of the signals and identifying functions representative of impulse responses from desired audio source(s) to the first and second cross-dipole and the monopole microphones, respectively, cross-correlating means for cross-correlating the functions of the first cross-dipole and the monopole microphones and the functions of the second cross-dipole and the monopole microphones to produce respective estimates representative of the lag of the most dominant audio source, and means for using the estimates representative of lag to determine an angle-estimate of the most dominant source.

The present invention will now be described, by way of example, with reference to the accompanying drawings, wherein:

FIG. 1 shows an array of four microphones,

FIG. 2 is a block schematic diagram of an embodiment of an apparatus made in accordance with the present invention,

FIG. 3 is a flow chart illustrating the method in accordance with the present invention,

FIG. 4 is a block schematic diagram of a filtered-sum beamformer (FSB),

FIG. 5 shows the geometry of a circular array of three equally spaced microphones, and

FIG. 6 shows diagrams of a monopole and two orthogonal dipoles.

In the drawings the same reference numerals have been used to represent corresponding features.

FIG. 1 shows a square array of four microphones M₁, M₂, M₃ and M₄ arranged symmetrically about orthogonal axes passing through an origin 0, with the horizontal axis being the zero degree axis. The length of a side of the square between adjacent corners is distance d and the spacing between the diagonally arranged microphones M₁ (M₂) and M₃ (M₄) is √2 times d. The diagonally arranged microphone pairs M₁, M₃ and M₂, M₄ form dipole pairs and the combined outputs of all four microphones form a virtually constructed monopole microphone. An audio source X is disposed at an azimuth angle φ relative to the zero degree axis.

The normalized (frequency independent) dipole-response is computed as:

$\begin{matrix}{\overline{E_{d}}\left(\varphi,\phi\right) = I_{ideal} \cdot T \cdot E_{d}\left(\varphi,\phi\right)} & (1)\end{matrix}$

where:

$\begin{matrix}{{{E_{d}\left( {\varphi,\phi} \right)} = {{{\cos \left( {\varphi + \frac{\pi}{4}} \right)} \cdot {E_{d}\left( {{{- \pi}/4},\phi} \right)}} + {{\sin \left( {\varphi + \frac{\pi}{4}} \right)} \cdot {E_{d}\left( {{\pi/4},\phi} \right)}}}},} & (2)\end{matrix}$

and where

$\begin{matrix}\begin{matrix}{E_{d}\left(\pi/4,\phi\right) = E_{2} - E_{4}} \\{= S \cdot \left( e^{j \cdot \sqrt{2} \cdot \Omega \cdot \cos\left(\phi - \frac{\pi}{4}\right)} - e^{-j \cdot \sqrt{2} \cdot \Omega \cdot \cos\left(\phi - \frac{\pi}{4}\right)} \right)} \\{= j \cdot 2 \cdot S \cdot \sin\left( \sqrt{2} \cdot \Omega \cdot \cos\left(\phi - \frac{\pi}{4}\right) \right)}\end{matrix} & (3) \\{E_{d}\left(\pi/4,\phi\right) \approx j \cdot S \cdot 2\sqrt{2} \cdot \Omega \cdot \cos\left(\phi - \frac{\pi}{4}\right)} & (4) \\\begin{matrix}{E_{d}\left(-\pi/4,\phi\right) = E_{3} - E_{1}} \\{= S \cdot \left( e^{j \cdot \sqrt{2} \cdot \Omega \cdot \cos\left(\phi + \frac{\pi}{4}\right)} - e^{-j \cdot \sqrt{2} \cdot \Omega \cdot \cos\left(\phi + \frac{\pi}{4}\right)} \right)} \\{= j \cdot 2 \cdot S \cdot \sin\left( \sqrt{2} \cdot \Omega \cdot \cos\left(\phi + \frac{\pi}{4}\right) \right)}\end{matrix} & (5) \\{E_{d}\left(-\pi/4,\phi\right) \approx j \cdot S \cdot 2\sqrt{2} \cdot \Omega \cdot \cos\left(\phi + \frac{\pi}{4}\right)} & (6)\end{matrix}$

with $\phi$ the angle of incidence of sound, $\varphi$ the angle of the main lobe of the superdirectional response, E_(i) the signal picked up by each of the microphones M_(i), S the sensitivity of each of the microphones and Ω given by:

$\begin{matrix}{\Omega = \frac{\omega \cdot d}{2 \cdot c}} & (7)\end{matrix}$

with ω the angular frequency (in radians per second), d the distance between the microphones and c the speed of sound.

The approximations for E_(d)(π/4,φ) and E_(d)(−π/4,φ) are valid for small values of Ω, that is, where the distance d is smaller than the wavelength λ of the sound, where:

λ=2πc/ω.

Furthermore I_(ideal) is an ideal integrator, defined as:

$\begin{matrix}{I_{ideal} = \frac{1}{j\omega}} & (8)\end{matrix}$

and T is an extra compensation term defined as

$\begin{matrix}{T = \frac{c}{\sqrt{2} \cdot d}} & (9)\end{matrix}$

The integrator is required to remove the jω-dependency in the dipole response.
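As a brief worked check of this normalization (a sketch that follows only from the small-Ω approximation (4) and the definitions (7) to (9)), substituting Ω, I_(ideal) and T into the +π/4 dipole shows that both the jω-dependency and the dependency on d cancel:

$\begin{matrix}{I_{ideal} \cdot T \cdot E_{d}\left(\pi/4,\phi\right) \approx \frac{1}{j\omega} \cdot \frac{c}{\sqrt{2} \cdot d} \cdot j \cdot S \cdot 2\sqrt{2} \cdot \frac{\omega \cdot d}{2 \cdot c} \cdot \cos\left(\phi - \frac{\pi}{4}\right) = S \cdot \cos\left(\phi - \frac{\pi}{4}\right)}\end{matrix}$

The same steps applied to approximation (6) give S·cos(ϕ+π/4) for the −π/4 dipole.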

The normalized monopole response $\overline{E_{m}}(\phi)$ is computed as:

$\begin{matrix}\begin{matrix}{\overline{E_{m}}(\phi) = \frac{1}{4} \cdot \sum\limits_{i = 1}^{4} E_{i}} \\{= \frac{1}{4} \cdot S \cdot \left\lbrack e^{j \cdot \sqrt{2} \cdot \Omega \cdot \cos\left(\phi - \frac{\pi}{4}\right)} + e^{-j \cdot \sqrt{2} \cdot \Omega \cdot \cos\left(\phi - \frac{\pi}{4}\right)} + e^{j \cdot \sqrt{2} \cdot \Omega \cdot \cos\left(\phi + \frac{\pi}{4}\right)} + e^{-j \cdot \sqrt{2} \cdot \Omega \cdot \cos\left(\phi + \frac{\pi}{4}\right)} \right\rbrack} \\{= \frac{1}{2} \cdot S \cdot \left\lbrack \cos\left( \sqrt{2} \cdot \Omega \cdot \cos\left(\phi - \frac{\pi}{4}\right) \right) + \cos\left( \sqrt{2} \cdot \Omega \cdot \cos\left(\phi + \frac{\pi}{4}\right) \right) \right\rbrack}\end{matrix} & (10)\end{matrix}$

The overline indicates that the response has been normalized with a maximum response S (equal to the response of a single sensor).
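The dipole and monopole constructions of equations (1) to (10) can be illustrated numerically. The following is a minimal sketch, not the patented implementation: it models a single-frequency plane wave, and the speed of sound `C`, the corner angles of M₁ to M₄ (chosen so that the phasors agree with equations (3) and (5)) and all function names are assumptions made for illustration.

```python
import cmath
import math

C = 343.0  # assumed speed of sound in m/s


def mic_signals(phi, freq_hz, d, s=1.0):
    """Phasors E_1..E_4 for a plane wave arriving from azimuth phi."""
    omega = 2.0 * math.pi * freq_hz
    big_omega = omega * d / (2.0 * C)  # equation (7)
    # Assumed corner angles of M1..M4, consistent with equations (3) and (5).
    corners = [3 * math.pi / 4, math.pi / 4, -math.pi / 4, -3 * math.pi / 4]
    return [s * cmath.exp(1j * math.sqrt(2) * big_omega * math.cos(phi - a))
            for a in corners]


def normalized_responses(phi, freq_hz, d, s=1.0):
    """Return (0 degree dipole, 90 degree dipole, monopole), all normalized."""
    e1, e2, e3, e4 = mic_signals(phi, freq_hz, d, s)
    omega = 2.0 * math.pi * freq_hz
    # I_ideal * T from equations (8) and (9)
    scale = (1.0 / (1j * omega)) * (C / (math.sqrt(2) * d))
    d_minus = e3 - e1  # E_d(-pi/4, phi), equation (5)
    d_plus = e2 - e4   # E_d(+pi/4, phi), equation (3)

    def steer(main_lobe):
        # Equation (2), followed by the normalization of equation (1).
        return scale * (math.cos(main_lobe + math.pi / 4) * d_minus
                        + math.sin(main_lobe + math.pi / 4) * d_plus)

    monopole = (e1 + e2 + e3 + e4) / 4.0  # equation (10)
    return steer(0.0), steer(math.pi / 2), monopole
```

For a spacing that is small compared with the wavelength (for example d = 0.02 m at 500 Hz), the two steered dipoles come out close to S·cos ϕ and S·sin ϕ and the monopole close to S, which is the behaviour the cross-correlation step relies on.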

The technique for audio tracking uses the signals of two orthogonal dipoles (or crossed dipoles) E_(d)(0,φ) and E_(d)(π/2,φ) in combination with the monopole E_(m)(φ) to compute two (time averaged) cross-correlation values as follows:

$\begin{matrix}{X_{c} = \mathrm{xcorr}\left\lbrack \overline{E_{d}}(0,\phi),\overline{E_{m}}(\phi) \right\rbrack \approx \cos\hat{\varphi}} & (11)\end{matrix}$

and:

$\begin{matrix}{X_{s} = \mathrm{xcorr}\left\lbrack \overline{E_{d}}(\pi/2,\phi),\overline{E_{m}}(\phi) \right\rbrack \approx \sin\hat{\varphi}} & (12)\end{matrix}$

which approximate the cosine and the sine values of the audio source angle φ.

An estimate of the angle of the audio source is now computed via thearctangent operation:

$\begin{matrix}{\hat{\varphi} = \left\{ \begin{matrix}{{\tan^{- 1}\left( \frac{X_{s}}{X_{c}} \right)}\mspace{14mu}} & {{{if}\text{:}\mspace{14mu} X_{c}} \geq 0} \\{{{{{sgn}\left( X_{s} \right)}\pi} + {\tan^{- 1}\left( \frac{X_{s}}{X_{c}} \right)}}\mspace{14mu}} & {{{if}\text{:}\mspace{14mu} X_{c}} < 0}\end{matrix} \right.} & (13)\end{matrix}$

The ambiguity of the arctangent can be resolved since the signs of the cosine and the sine estimates are available from equations (11) and (12).
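The two-branch rule of equation (13) can be sketched as follows. This is an illustrative reading of the equation, with two assumptions that are not in the text: sgn(0) is treated as +1, and an explicit guard is added for X_(c) = 0 so that the division is always defined. The function name is hypothetical.

```python
import math


def bearing_from_correlations(x_c, x_s):
    """Four-quadrant angle from the cosine and sine estimates, equation (13)."""
    if x_c == 0.0:
        # Added guard (assumption): equation (13) would divide by zero here.
        return math.copysign(math.pi / 2, x_s) if x_s != 0.0 else 0.0
    base = math.atan(x_s / x_c)
    if x_c >= 0.0:
        return base
    # Assumption: sgn(0) is taken as +1 so that (x_c < 0, x_s = 0) maps to +pi.
    sign = 1.0 if x_s >= 0.0 else -1.0
    return sign * math.pi + base
```

Away from these edge cases the result agrees with the standard library call `math.atan2(x_s, x_c)`, which is the usual way to evaluate a four-quadrant arctangent in practice.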

It is noted that for bad signal-to-noise ratios, the estimate of the audio-source angle will be degraded. For the extreme case of only (2D or 3D) diffuse (that is isotropic) noise, it can be shown that the cross-dipoles and the monopoles are mutually uncorrelated and the values of X_(c) and X_(s) are uniformly distributed random variables. As a result, the estimate $\hat{\varphi}$ will also behave as a uniform random variable between ±π.

In order to overcome the dominant reflections of the audio source negatively influencing the result of the audio tracking, the crossed-dipole signals E_(d)(0,φ) and E_(d)(π/2,φ) and the monopole signal E_(m)(φ) are applied to a filtered-sum beamformer (FSB) to be described with reference to FIG. 4. The FSB identifies functions representing an important part of the impulse responses from the audio source to the (virtual) dipole and monopole microphones. These functions will be denoted as h_(d)(0), h_(d)(π/2) and h_(m).

Instead of computing the cross-correlation between the signals of the crossed-dipoles and the monopole as in equations (11) and (12), pairs of functions identified by the FSB are cross-correlated:

ψ_(c) =xcorr[h _(d)(0),h _(m)],  (14)

and

ψ_(s) =xcorr[h _(d)(π/2),h _(m)].  (15)

The lag l in ψ_(c)(l) and ψ_(s)(l) which is representative of the most dominant audio source (other lags are representative of reflections) is found by:

$\begin{matrix}{l = {\arg {\max\limits_{i}\left\{ {\left\lbrack {\psi_{c}(i)} \right\rbrack^{2} + \left\lbrack {\psi_{s}(i)} \right\rbrack^{2}} \right\}}}} & (16)\end{matrix}$

These cross-correlations for lag l approximate the cosine and the sine of the most dominant audio source coming from azimuth angle φ:

$\begin{matrix}{\psi_{c}(l) \approx \cos\hat{\varphi}} & (17)\end{matrix}$

and

$\begin{matrix}{\psi_{s}(l) \approx \sin\hat{\varphi}} & (18)\end{matrix}$

The angle estimate $\hat{\varphi}$ is now computed as:

$\begin{matrix}{\hat{\varphi} = \left\{ \begin{matrix}{{\tan^{- 1}\left( \frac{\psi_{s}(l)}{\psi_{c}(l)} \right)}\mspace{14mu}} & {{{if}\text{:}\mspace{14mu} {\psi_{c}(l)}} \geq 0} \\{{{{{sgn}\left( {\psi_{s}(l)} \right)}\pi} + {\tan^{- 1}\left( \frac{\psi_{s}(l)}{\psi_{c}(l)} \right)}}\mspace{14mu}} & {{{if}\text{:}\mspace{14mu} {\psi_{c}(l)}} < 0}\end{matrix} \right.} & (19)\end{matrix}$

It is noted that an efficient cross-correlation of two vectors can be implemented via the Fast Fourier Transform.
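The lag selection and angle computation of equations (14) to (19) can be sketched on toy data. Everything below is an assumption made for illustration: the FSB functions are replaced by hypothetical two-tap impulse responses (a direct path and one weaker reflection), the cross-correlation is written in direct form for readability rather than via the FFT, and equation (19) is evaluated with `math.atan2`.

```python
import math


def xcorr(a, b):
    """Direct-form cross-correlation: psi[lag] = sum over n of a[n + lag] * b[n]."""
    result = {}
    for lag in range(-(len(b) - 1), len(a)):
        total = 0.0
        for n, b_n in enumerate(b):
            k = n + lag
            if 0 <= k < len(a):
                total += a[k] * b_n
        result[lag] = total
    return result


def estimate_angle(h_d0, h_d90, h_m):
    """Return (lag, angle) following equations (14) to (19)."""
    psi_c = xcorr(h_d0, h_m)   # equation (14)
    psi_s = xcorr(h_d90, h_m)  # equation (15)
    # Equation (16): lag with the largest combined squared correlation.
    lag = max(psi_c, key=lambda i: psi_c[i] ** 2 + psi_s[i] ** 2)
    # Equation (19), evaluated as a four-quadrant arctangent.
    return lag, math.atan2(psi_s[lag], psi_c[lag])


def toy_response(direct_gain, reflection_gain, length=16):
    """Hypothetical impulse response: direct path at tap 5, reflection at tap 12."""
    h = [0.0] * length
    h[5] = direct_gain
    h[12] = reflection_gain
    return h


phi = math.radians(40)     # assumed direct-path azimuth
phi_r = math.radians(130)  # assumed azimuth of the reflection
r = 0.3                    # assumed relative strength of the reflection
h_m = toy_response(1.0, r)
h_d0 = toy_response(math.cos(phi), r * math.cos(phi_r))
h_d90 = toy_response(math.sin(phi), r * math.sin(phi_r))
lag, angle = estimate_angle(h_d0, h_d90, h_m)
```

In this toy model the selected lag is the one at which the direct-path taps align, and the resulting angle lies within a few degrees of the assumed direct-path azimuth; the cross terms between direct path and reflection fall on other lags.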

Referring to FIG. 2, the microphones are formed into a closely spaced square array 10 with a distance d between adjacent microphones being smaller than the wavelength of the sound being detected. The microphones are centered at the same point and the sound source has a bearing φ. A first cross-dipole is formed by coupling the outputs of microphones M₁ and M₃ to respective inverting and non-inverting inputs of a summing stage 12. A second cross-dipole is formed by coupling the outputs of microphones M₂ and M₄ to respective non-inverting and inverting inputs of a summing stage 14. The first and second cross-dipoles are oriented at right angles. Outputs of the summing stages 12, 14 are coupled to respective integrating stages 16, 18. The respective integrating stage 16, 18 outputs are dipole −45 degrees (or −π/4 radians) and dipole +45 degrees (or π/4 radians). These outputs are applied to respective amplifying stages 20, 22. The output from the amplifying stage 20 is applied to an inverting input of a summing stage 24 and the output from the amplifying stage 22 is applied to a non-inverting input of the summing stage 24. The output of the summing stage 24 comprises a 0 degree dipole cos(φ). The output from the amplifying stage 22 is applied to a first non-inverting input of a summing stage 26 and the output from the amplifying stage 20 is applied to a second non-inverting input of the summing stage 26. The output of the summing stage 26 comprises a 90 degree dipole sin(φ).

A monopole signal is produced by connecting the microphones M₁ to M₄ to a summing stage 28, the output from which is applied to an attenuating amplifier 30 having a gain of ¼.

A filtered sum beamforming stage (FSB) 32 has inputs 34, 36, 38 for the dipole 90 degree signal E_(d)(π/2,φ), the dipole 0 degree signal E_(d)(0,φ) and the monopole signal E_(m), respectively. The FSB 32 identifies the functions h_(d)(π/2), h_(d)(0) and h_(m) of the FSB which are representative of the impulse responses from the desired audio source(s) to the cross-dipole microphones and the (virtually constructed) monopole microphone. In effect the FSB 32 separates the dominant audio source from the dominant reflective sources. The functions h_(d)(π/2), h_(d)(0) and h_(m) are present on outputs 40, 42, 44, respectively. The FSB 32 has a further output 46 for an output signal which is not used in the method in accordance with the present invention. The functions h_(d)(π/2) and h_(m) on the outputs 40, 44 are applied to a first cross-correlator 48 which produces ψ_(s) in accordance with equation (15) above. The functions h_(d)(0) and h_(m) on the outputs 42, 44 are applied to a second cross-correlator 50 which produces ψ_(c) in accordance with equation (14) above. Outputs from the cross-correlators 48 and 50 are applied to an arctangent (or tan⁻¹) stage 52 which determines the lag representative of the most dominant audio source (other lags being representative of reflections) and computes an angle-estimate $\hat{\varphi}$ in accordance with equation (19) above.

The angle estimate $\hat{\varphi}$ is derived in accordance with the method illustrated in the flow chart shown in FIG. 3. Block 60 represents obtaining normalised cross-dipole signals. Block 62 represents obtaining a normalised monopole signal. Block 64 represents producing the first and second dipole functions h_(d)(π/2) and h_(d)(0) and the monopole function h_(m) by filtering the normalised first and second cross-dipole signals and the monopole signal in the FSB. Block 66 represents cross-correlating the first dipole function h_(d)(π/2) with the monopole function h_(m) to produce ψ_(s). Block 68 represents cross-correlating the second cross-dipole function h_(d)(0) with the monopole function h_(m) to produce ψ_(c). Block 70 represents determining the lag l representative of the most dominant audio source. Block 72 represents determining the sine and cosine estimates ψ_(s)(l) and ψ_(c)(l) at the lag l. Finally block 74 represents estimating the bearing $\hat{\varphi}$ of the audio source by obtaining the arctangent from the sine and cosine estimates at the lag l.

FIG. 4 is a block schematic diagram of a filtered-sum beamformer or FSB suitable for use with the method in accordance with the present invention. The dipole 90 degree signal aV_(in), dipole 0 degree signal bV_(in) and the monopole signal cV_(in), where a, b and c are respective attenuation coefficients, are applied to respective inputs 34, 36 and 38. The input 34 is connected to a filter 76 having a transfer function W₁, the input 36 is connected to a filter 78 having a transfer function W₂ and the input 38 is connected to a filter 80 having a transfer function W₃. The filters 76, 78 and 80 respectively produce processed signals V_(P), V_(Q) and V_(R), each of which can be written (in the frequency domain) as:

V _(P) =aV _(in) ·W ₁

V _(Q) =bV _(in) ·W ₂

V _(R) =cV _(in) ·W ₃

These signals are applied to a summing stage 82 which produces a combined signal:

V _(sum) =V _(P) +V _(Q) +V _(R) =aV _(in) ·W ₁ +bV _(in) ·W ₂ +cV _(in) ·W ₃

V_(sum) appears on the output 46 and also is applied to three further adjustable filters 84, 86 and 88 which derive filtered combined signals using transfer functions W₁*, W₂* and W₃* which are the complex conjugates of W₁, W₂ and W₃, respectively.

The first filtered combined signal is equal to:

V _(FC1)=(aV _(in) ·W ₁ +bV _(in) ·W ₂ +cV _(in) ·W ₃)·W ₁*

The second filtered combined signal is equal to:

V _(FC2)=(aV _(in) ·W ₁ +bV _(in) ·W ₂ +cV _(in) ·W ₃)·W ₂*

The third filtered combined signal is equal to:

V _(FC3)=(aV _(in) ·W ₁ +bV _(in) ·W ₂ +cV _(in) ·W ₃)·W ₃*

A first difference measure between the signal a·V_(in) and the first filtered combined signal is determined by a subtractor 90. The output signal of the subtractor 90 can be written as:

$\begin{matrix}{V_{{DIFF}\; 1} = {{aV}_{in} - {\left( {{{aV}_{in}.W_{1}} + {{bV}_{in}.W_{2}} + {{cV}_{in}.W_{3}}} \right).W_{1}^{*}}}} \\{= {V_{in}\left( {a - {\left( {{aW}_{1} + {bW}_{2} + {cW}_{3}} \right).W_{1}^{*}}} \right)}}\end{matrix}$

A second difference measure between the signal b·V_(in) and the second filtered combined signal is determined by a subtractor 92. The output signal of the subtractor 92 can be written as:

V _(DIFF2) =V _(in)(b−(a·W ₁ +b·W ₂ +c·W ₃)·W ₂*)

A third difference measure between the signal c·V_(in) and the third filtered combined signal is determined by a subtractor 94. The output signal of the subtractor 94 can be written as:

V _(DIFF3) =V _(in)(c−(a·W ₁ +b·W ₂ +c·W ₃)·W ₃*)

The arrangement according to FIG. 4 comprises control elements 96, 98 and 100 for respectively adjusting the coefficients of the filters 76 and 84, 78 and 86, and 80 and 88 to make the power of the respective output signals V_(DIFF1), V_(DIFF2) and V_(DIFF3) equal to zero. In order to find the values for W₁, W₂ and W₃ that make the difference signals equal to zero, the following equations have to be solved.

In order to facilitate an understanding of the process only two of the difference equations will be considered.

a=(aW ₁ +bW ₂ +cW ₃)·W ₁*  (A)

b=(a·W ₁ +b·W ₂ +c·W ₃)·W ₂*  (B)

Eliminating the term (a·W₁+b·W₂+c·W₃) in equations (A) and (B) by dividing (A) by (B) results in:

$\begin{matrix}{\frac{W_{1}^{*}}{W_{2}^{*}} = {\left. \frac{a}{b}\Rightarrow W_{1}^{*} \right. = \frac{a \cdot W_{2}^{*}}{b}}} & (C)\end{matrix}$

Conjugating the left hand side and the right hand side of (C) gives for W₁:

$\begin{matrix}{\frac{W_{1}}{W_{2}} = {\left. \frac{a^{*}}{b^{*}}\Rightarrow W_{1} \right. = \frac{a^{*} \cdot W_{2}}{b^{*}}}} & (D)\end{matrix}$

Substituting (D) into (B), with the term c·W₃ omitted, gives the following expression:

$\begin{matrix}{{\left( \frac{|a|^{2} \cdot W_{2}}{b^{*}} + b \cdot W_{2} \right) \cdot W_{2}^{*}} = b} & (E)\end{matrix}$

Rearranging (E) gives for |W₂|²:

$\begin{matrix}{|W_{2}|^{2} = \frac{|b|^{2}}{|a|^{2} + |b|^{2}}} & (F)\end{matrix}$

|W₁|² can be found in the same way:

$\begin{matrix}{|W_{1}|^{2} = \frac{|a|^{2}}{|a|^{2} + |b|^{2}}} & (G)\end{matrix}$

From (F) and (G) it is clear that the value of |W₁|² increases when |a|² increases (or |b|² decreases) and that the value of |W₂|² increases when |b|² increases (or |a|² decreases). In such a way the strongest input signal is pronounced. This is of use to enhance a speech signal of a speaker over background noise and reverberant components of the speech signal without needing to know the frequency dependence of the paths from the speaker to the microphones as was needed in prior art arrangements.
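The two-input result of equations (F) and (G) can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and `fsb_weights` returns just one particular solution (conjugated, normalized gains), chosen here because it satisfies the two-input forms of equations (A) and (B); the text itself fixes only the magnitudes.

```python
import math


def fsb_weight_powers(a, b):
    """Return (|W1|^2, |W2|^2) according to equations (G) and (F)."""
    power_a, power_b = abs(a) ** 2, abs(b) ** 2
    total = power_a + power_b
    if total == 0.0:
        raise ValueError("at least one input gain must be non-zero")
    return power_a / total, power_b / total


def fsb_weights(a, b):
    """One particular (assumed) solution: W_k = conj(gain_k) / sqrt(|a|^2 + |b|^2)."""
    norm = math.sqrt(abs(a) ** 2 + abs(b) ** 2)
    if norm == 0.0:
        raise ValueError("at least one input gain must be non-zero")
    return complex(a).conjugate() / norm, complex(b).conjugate() / norm
```

With these weights, (a·W₁+b·W₂)·W₁* reproduces a and (a·W₁+b·W₂)·W₂* reproduces b, and the two weight powers always sum to one, so the input with the larger gain receives the larger share.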

FIG. 5 illustrates a circular array of at least three (omni- or uni-directional microphone) sensors in a planar geometry and the application of signal processing techniques. With such an arrangement it is possible to construct a zeroth-order (that is monopole) response and two orthogonal first-order dipole responses.

Using uni-directional cardioid microphones has the main benefit that the sensitivity to sensor noise and sensor mismatches is greatly reduced for the construction of the first-order dipole responses. Additionally FIG. 5 shows the construction of a monopole and two orthogonal dipoles by way of a uniformly spaced circular array of three, outwardly pointing, unidirectional cardioid microphones.

The responses of the three cardioid microphones are given by E_(c)⁰, E_(c)^(2π/3) and E_(c)^(4π/3). Assuming that there is no uncorrelated sensor noise, the ith cardioid microphone response is ideally given by:

$\begin{matrix}{E_{c}^{2i\pi/3} = \left\lbrack \frac{1}{2} + \frac{1}{2}\cos\left(\varphi - \frac{2i\pi}{3}\right)\sin\theta \right\rbrack e^{j\psi_{y}}} & (20)\end{matrix}$

with:

$\begin{matrix}{\psi_{y} = {\frac{2\pi \; f}{c}\sin \; {\theta \left( {{p_{x}^{i}\cos \; \varphi} + {p_{y}^{i}\sin \; \varphi}} \right)}}} & (21)\end{matrix}$

where θ and φ are the standard spherical coordinate angles, that is, elevation and azimuth.

Using

$\begin{matrix}{p_{x}^{i} = r\cos\left(\frac{2i\pi}{3}\right)} & (22)\end{matrix}$

and:

$\begin{matrix}{p_{y}^{i} = r\sin\left(\frac{2i\pi}{3}\right)} & (23)\end{matrix}$

with r the radius of the circle we can write:

$\begin{matrix}{\psi_{y} = \frac{2\pi f}{c}\sin\theta\cos\left(\varphi - \frac{2i\pi}{3}\right)r} & (24)\end{matrix}$

From the three cardioid microphones the following monopole and orthogonal dipoles can be constructed as:

$\begin{bmatrix}E_{m} \\E_{d}^{0} \\E_{d}^{\pi/2}\end{bmatrix} = {{{\frac{2}{3}\begin{bmatrix}1 & 1 & 1 \\2 & {- 1} & {- 1} \\0 & \sqrt{3} & {- \sqrt{3}}\end{bmatrix}}\begin{bmatrix}E_{c}^{0} \\E_{c}^{2{\pi/3}} \\E_{c}^{4{\pi/3}}\end{bmatrix}}.}$

For wavelengths larger than the size of the array, the responses of the monopole and the orthogonal dipoles are frequency invariant and ideally equal to:

E _(m)=1  (25)

E _(d) ⁰(θ,φ)=cos φ sin θ  (26)

E _(d) ^(π/2)(θ,φ)=cos(φ−π/2)sin θ  (27)
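The combination matrix above can be checked against equations (25) to (27) with a short numerical sketch. It assumes the long-wavelength case, so the phase factor of equation (20) is set to 1; the function names are hypothetical.

```python
import math


def cardioid(i, azimuth, elevation):
    """Equation (20) with the phase factor taken as 1 (long-wavelength assumption)."""
    return 0.5 + 0.5 * math.cos(azimuth - 2 * i * math.pi / 3) * math.sin(elevation)


def monopole_and_dipoles(azimuth, elevation):
    """Apply the 2/3-scaled combination matrix to the three cardioid responses."""
    e0, e1, e2 = (cardioid(i, azimuth, elevation) for i in range(3))
    root3 = math.sqrt(3)
    e_m = (2 / 3) * (e0 + e1 + e2)
    e_d0 = (2 / 3) * (2 * e0 - e1 - e2)
    e_d90 = (2 / 3) * (root3 * e1 - root3 * e2)
    return e_m, e_d0, e_d90
```

Under this assumption the three outputs reduce to 1, cos φ sin θ and sin φ sin θ for any azimuth and elevation, the last being the same as cos(φ−π/2) sin θ in equation (27).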

The directivity patterns of these monopole and orthogonal dipoles are shown in FIG. 6.

The monopole response is referenced E_(m) and the orthogonal dipole responses are referenced E_(d)⁰(θ,φ) and E_(d)^(π/2)(θ,φ).

In a non-illustrated embodiment the three uni-directional cardioid microphones are arranged unequally spaced on the periphery of a circle, for example with the apices forming a right-angled triangle.

In the present specification and claims the word “a” or “an” preceding an element does not exclude the presence of a plurality of such elements. Further, the word “comprising” does not exclude the presence of other elements or steps than those listed.

The use of any reference signs placed between parentheses in the claims shall not be construed as limiting the scope of the claims.

From reading the present disclosure, other modifications will be apparent to persons skilled in the art. Such modifications may involve other features which are already known in the design, manufacture and use of planar audio tracking systems and components therefor and which may be used instead of or in addition to features already described herein.

1. A method of planar audio tracking using virtual first and second cross-dipole microphones and a virtual monopole microphone, the method comprising: directional pre-processing signals from the first and second cross-dipole microphones and the monopole microphone, filtering the results of the directional pre-processing of the signals, identifying functions representative of impulse responses from at least one desired audio source to the first and second cross-dipole and the monopole microphones, respectively, cross-correlating the functions of the first cross-dipole and the monopole microphones and the functions of the second cross-dipole and the monopole microphones to produce respective estimates representative of a lag of a most dominant audio source, and using the estimates representative of the lag to determine an angle-estimate of the most dominant source.
2. A method as claimed in claim 1, wherein the filtering of the results of directional pre-processing is carried out in a filtered-sum beamformer.
3. A method as claimed in claim 1, wherein the directional pre-processing of the signals includes obtaining normalised first and second crossed dipole signals and a normalised monopole signal.
4. A method as claimed in claim 1, wherein there are four microphones arranged as a square array.
5. A method as claimed in claim 4, wherein a length of a side of the square array is less than a wavelength of the sound of interest.
6. A method as claimed in claim 1, wherein the at least three microphones are cardioid microphones arranged as a circular array of equally spaced microphones.
7. A planar audio tracking apparatus comprising: virtual first and second cross-dipole microphones, a virtual monopole microphone, a processor for directional pre-processing signals from the first and second cross-dipole microphones and the monopole microphone, a filter for filtering the results of the directional pre-processing of the signals and identifying functions representative of impulse responses from at least one desired audio source to the first and second cross-dipole and the monopole microphones, respectively, a cross-correlator for cross-correlating the functions of the first cross-dipole and the monopole microphones and the functions of the second cross-dipole and the monopole microphones to produce respective estimates representative of a lag of a most dominant audio source, and an element for using the estimates representative of the lag to determine an angle-estimate of the most dominant source.
8. An apparatus as claimed in claim 7, wherein the filter for filtering of the results of directional pre-processing includes a filtered-sum beamformer.
9. An apparatus as claimed in claim 7, wherein the processor for directional pre-processing of the signals includes an element for obtaining normalised first and second crossed dipole signals and a normalised monopole signal.
10. An apparatus as claimed in claim 7, wherein there are four microphones arranged as a square array.
11. An apparatus as claimed in claim 10, wherein a length of a side of the square array is less than a wavelength of a sound of interest.
12. An apparatus as claimed in claim 7, wherein the at least three microphones are cardioid microphones arranged as a circular array of equally spaced microphones.